-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
COMPAT: Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing #52830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPAT: Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing #52830
Conversation
…type') duck typing
seems reasonable |
In #52833 I need a similar check, but in cython. So if we do that, we can probably only define it in cython instead of in |
Moved it to cython given that we will likely need pyarrow in lib.pyx anyway (eg needed for #52833 as well) |
pandas/_typing.py
Outdated
@@ -93,10 +93,21 @@ | |||
from typing import Self | |||
else: | |||
from typing_extensions import Self # pyright: reportUnusedImport = false | |||
|
|||
try: | |||
from pyarrow import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we expect optional dependencies to be present for mypy checks, but not sure. who would know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's also fine with me, that would just simplify this a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this solves the problem Brock and I ran into in the other PR, this leaves the wider problem unresolved.
There are a bunch of things we might want to type check for (objects are only one part, but there are also arrow specific dtype checks). Do you suggest to create a helper function for every check that we might need at one point? This would get convoluted very quick
Can you be more specific? Which other specific checks are you thinking about? (beyond one of the main pyarrow classes) |
Possibly pyarrow Table and Scalars too. There might be a use case for checking for specific Scalar or Array subclasses too? |
I was referring to checks like, checking for an array/scalar and afterwards checking for a specific pyarrow type (int, ...). This would require helper functions for all checks we might want |
IIUC most of these would be relevant when we already have an ArrowExtensionArray and want to check if a scalar can be held in an array of that dtype, e.g. for setitem-with-expansion. Most of our EAs have something like this, either _validate_scalar or _validate_setitem_value. I think the way forward is to standardize this, so the relevant check would go on ArrowExtensionArray and we wouldn't need to worry about these helpers |
@phofl OK to move forward here (pending green)? |
Yep |
@jorisvandenbossche can you merge main and ping on green |
@jbrockmendel mypy is not happy with my attempt to add a type guard:
(and I can't directly seem to fix it) OK with leaving the typing part out for a follow-up PR? (I can directly open one with what I tried here, so we can discuss the best approach with the typing experts) |
sounds good |
thanks @jorisvandenbossche |
…ttr(.. 'type') duck typing (pandas-dev#52830) * Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing * fix import * move to cython * add docstring * try TypeGuard * remove ArrowArrayTypeGuard
…ttr(.. 'type') duck typing (pandas-dev#52830) * Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing * fix import * move to cython * add docstring * try TypeGuard * remove ArrowArrayTypeGuard
PR for the suggestion I made at https://github.com/pandas-dev/pandas/pull/52076/files#r1160653924